Hedge Classification in Biomedical Texts with a Weakly Supervised Selection of Keywords
Abstract
Since facts or statements in a hedged or negated context typically appear as false positives, the proper handling of these language phenomena is of great importance in biomedical text mining. In this paper we demonstrate the importance of hedge classification experimentally in two real-life scenarios, namely the ICD-9-CM coding of radiology reports and gene name entity extraction from scientific texts. We analysed the major differences in speculative language between these tasks and developed a maximum entropy (maxent) based solution for both the free text and scientific text processing tasks. Based on our results, we draw conclusions on the possible ways of tackling speculative language in biomedical texts.
Similar resources
Weakly Supervised Learning for Hedge Classification in Scientific Literature
We investigate automatic classification of speculative language (‘hedging’) in biomedical text using weakly supervised machine learning. Our contributions include a precise description of the task with annotation guidelines, analysis and discussion, a probabilistic weakly supervised learning model, and experimental evaluation of the methods presented. We show that hedge classification is feasi...
Exploiting Multi-Features to Detect Hedges and their Scope in Biomedical Texts
In this paper, we present a machine learning approach that detects hedge cues and their scope in biomedical texts. Identifying hedged information in texts is a kind of semantic filtering, and it is important because it can separate speculative information from factual information. In order to deal with the semantic analysis problem, various evidential features are proposed and integrated...
Exploiting Unlabeled Texts with Clustering-based Instance Selection for Medical Relation Classification
Classifying relations between pairs of medical concepts in clinical texts is a crucial task to acquire empirical evidence relevant to patient care. Due to limited labeled data and extremely unbalanced class distributions, medical relation classification systems struggle to achieve good performance on less common relation types, which capture valuable information that is important to identify. O...
An Improvement in Support Vector Machines Algorithm with Imperialism Competitive Algorithm for Text Documents Classification
Due to the exponential growth of electronic texts, their organization and management require a tool that provides users with the information they search for in the shortest possible time. Thus, classification methods have become very important in recent years. In natural language processing and especially text processing, one of the most basic tasks is automatic text classification. Moreover, text ...
Hedge Scope Detection in Biomedical Texts: An Effective Dependency-Based Method
Hedge detection is used to distinguish uncertain information from facts, which is of essential importance in biomedical information extraction. The task of hedge detection is often divided into two subtasks: detecting uncertain cues and their linguistic scope. Hedge scope is a sequence of tokens including the hedge cue in a sentence. Previous hedge scope detection methods usually take all token...